Many congratulations to Professors Xia and Tong for another
stimulating paper that grew out of their own creative thinking. The
starting point of the proposed approach is the fact that most, if not
all, statistical models are wrong. This applies not only to time
series models: a statistical model is, one hopes, a simplified
representation of the truth, and at best it catches some features of
the unknown underlying population. While this is common wisdom, most
statistical inference methods are confined to a framework which
assumes that the true model is a member of the family of models
concerned. The approach advocated in this paper acknowledges
explicitly that the assumed model is not the truth, and indeed it is
sometimes advantageous not to read too much into the assumed model.
For example, the authors have articulated elegantly that if our
interest lies in catching the linear dynamical structure, we should
not use (Gaussian) maximum likelihood estimation, which effectively
minimizes the one-step-ahead prediction errors only; in fact, a better
fitted autocovariance results from minimizing the prediction errors up
to $m$ steps ahead for some $m > 1$.

Following the lead of the authors, it seems to make sense to take on
board the concern for "wrong models" at the stage of model selection,
too, as hinted at the end of the paper. In a way, this has been
actively researched in the context of model selection. However, a
difference here is to use a measure of "goodness of fit" other than
the likelihood (or log-likelihood). Let us consider a simple case: fit
a linear AR($p$) model to observations $y_1, \ldots, y_n$ from a
stationary time series with mean 0, where the order $p$ is to be
determined by the data, too.
Let $y_{t,p} = (y_t, y_{t-1}, \ldots, y_{t-p+1})'$. Based on an
AR($p$) model (with independent innovations), the best predictor at
time $t$ for a future value $y_{t+m}$ should be a linear combination
of the $p$ components of $y_{t,p}$. In fact the best linear predictor
based on $y_{t,p}$ is

$$y_{t,p}' \alpha_{m,p} \quad \text{with} \quad \alpha_{m,p} = \Gamma_p^{-1} \gamma_{m,p}, \tag{1}$$

where $\Gamma_p$ is a $p \times p$ matrix with $\gamma(j - i)$ as its
$(i,j)$th element, $\gamma_{m,p}$ is a $p \times 1$ vector with
$\gamma(m + i - 1)$ as its $i$th element, and $\gamma(\cdot)$ denotes
the autocovariance function of $y_t$. In fact (1) holds for any
stationary process. However, if we fit $y_t$ with an AR($p$), its
autocovariance function $\gamma(\cdot)$ is then determined by
$\theta_p$, the parameters in an AR($p$) model. Put
$\alpha_{m,p} = \alpha_{m,p}(\theta_p)$. Then
$y_{t,p}' \alpha_{m,p}(\theta_p)$ is the best predictor for $y_{t+m}$
based on an AR($p$) model. Using the "matching up-to-$m$-step-ahead
point predictions" approach of Section 2.1, we estimate $\theta_p$
(for $p$ given) by



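$$\hat\theta_p = \arg\min_{\theta_p} Q_p(\theta_p),$$

where $Q_p(\theta_p)$ is the sample criterion that sums the squared
errors of the predictions up to $m$ steps ahead. However, we cannot
choose $p$ by minimizing $Q_p(\hat\theta_p)$, as $Q_p(\hat\theta_p)$
is likely to decrease as $p$ increases.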
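
As a rough illustration of formula (1), the following Python sketch computes the coefficient vector $\Gamma_p^{-1} \gamma_{m,p}$ from a given autocovariance function. The function names (`solve`, `predictor_coefficients`, `ar1_autocov`) and the AR(1) example with coefficient 0.6 are assumptions made for this illustration only; they do not come from the paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def predictor_coefficients(gamma, p, m):
    """Return alpha_{m,p} = Gamma_p^{-1} gamma_{m,p} for an autocovariance function gamma."""
    # Gamma_p has gamma(j - i) as its (i, j)th element.
    big_gamma = [[gamma(j - i) for j in range(p)] for i in range(p)]
    # gamma_{m,p} has gamma(m + i - 1) as its ith element, i = 1, ..., p;
    # with zero-based i this is gamma(m + i).
    small_gamma = [gamma(m + i) for i in range(p)]
    return solve(big_gamma, small_gamma)


def ar1_autocov(phi, sigma2=1.0):
    """Autocovariance function of a stationary AR(1) process (illustrative choice)."""
    return lambda k: sigma2 * phi ** abs(k) / (1.0 - phi ** 2)


# Three-step-ahead predictor based on (y_t, y_{t-1}) for an AR(1) with phi = 0.6.
alpha = predictor_coefficients(ar1_autocov(0.6), p=2, m=3)
```

For an AR(1) process the $m$-step-ahead predictor depends on $y_t$ alone with weight $\phi^m$, so in this example the first coefficient should be close to $0.6^3$ and the second close to zero.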

To appreciate the difficulties involved, let us first consider the
"ideal world" where the (true) distribution of $\{y_t\}$ is known.
Then we should estimate $p$ by

$$\hat p = \arg\min_p Q_p^*(\theta_p),$$

where $Q_p^*(\theta_p)$ is the population counterpart of
$Q_p(\theta_p)$, with expectations in place of sample averages.
Unfortunately $Q_p^*(\theta_p)$ still decreases as $p$ increases.
Information on the noise component of $y_t$ (e.g., its variance) is
required in order to know when to stop. This is the standard problem
in model selection, even for linear regression. One way to get away
from this requirement is to take the log-transformation. Namely, we
define

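$$L^*(p) = \log\{Q_p^*(\theta_p)\} = \log\Big\{ \sum_{k=1}^{m} E\{y_{t+k} - y_{t,p}' \alpha_{k,p}(\theta_p)\}^2 \Big\}.$$

When $p$ is in the range on which $Q_p^*(\theta_p)$ varies slowly
(with respect to $p$), it holds that

$$L^*(p) - L^*(p+1) \approx \frac{Q_p^*(\theta_p) - Q_{p+1}^*(\theta_{p+1})}{Q_p^*(\theta_p)}.$$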
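
The approximation can be seen from a first-order expansion of the logarithm. Writing $a = Q_p^*(\theta_p)$ and $b = Q_{p+1}^*(\theta_{p+1})$ as shorthand (notation introduced here only for this step), and assuming the relative change $(a - b)/a$ is small:

```latex
\begin{align*}
L^*(p) - L^*(p+1)
  &= \log a - \log b
   = -\log\Bigl(1 - \frac{a - b}{a}\Bigr) \\
  &= \frac{a - b}{a} + O\Bigl(\Bigl(\frac{a - b}{a}\Bigr)^{2}\Bigr).
\end{align*}
```

The leading term is a relative decrease, so it is unchanged when $y_t$ is rescaled; this is why the log-transformation removes the need to know the noise variance.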
Intuitively we would like to choose the smallest $p$ such that the
decrease $L^*(p) - L^*(p+1)$ is smaller than an appropriate but
unknown constant. In practice, we may use
$L(p) = \log\{Q_p(\hat\theta_p)\}$ to replace $L^*(p)$, and choose $p$
to minimize

$$L(p) + E\{L^*(p) - L(p)\}.$$

This is in the same spirit as AIC in the sense that the bias
$E\{L^*(p) - L(p)\}$ serves as the penalty for model complexity. When
the true model of $y_t$ is not AR, this bias does not admit a simple
asymptotic expression such as that of AIC, even when $m = 1$; see, for
example, Konishi and Kitagawa (1996). One may also consider developing
resampling estimates for this bias.
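
The intuitive rule mentioned above (pick the smallest $p$ for which the drop in the log criterion falls below a constant) can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function name `choose_order`, the threshold `c` and the numerical criterion values are hypothetical, and the sketch does not attempt to estimate the bias term $E\{L^*(p) - L(p)\}$.

```python
import math


def choose_order(q_values, c):
    """Return the smallest order p with L(p) - L(p+1) < c, where L = log Q.

    q_values[i] is the criterion value for order p = i + 1 (hypothetical input);
    c is a user-chosen threshold (the "appropriate but unknown constant").
    """
    logs = [math.log(q) for q in q_values]
    for i in range(len(logs) - 1):
        if logs[i] - logs[i + 1] < c:
            return i + 1
    # No drop was small enough: fall back to the largest order considered.
    return len(logs)


# Made-up criterion values for p = 1, 2, 3, 4.
q = [2.0, 1.2, 1.15, 1.14]
selected = choose_order(q, c=0.1)

# Rescaling the series multiplies every Q by the same factor,
# which leaves the differences of logs, and hence the choice, unchanged.
selected_rescaled = choose_order([100.0 * v for v in q], c=0.1)
```

Because the rule works with differences of logarithms, the selected order does not change when all criterion values are multiplied by a common factor, which reflects the motivation for the log-transformation.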

The above line of thinking was prompted by reading this interesting
paper, which will serve as an inspiration for further research in
tackling the issues related to the lack of a true model. One may,
however, quibble over the use of the phrase "catch-all approach." If a
model could catch all the features, it should be the true model, or at
least pragmatically so. One message from the paper is that one should
fit (and perhaps also choose) a model according to the specific
purpose in hand, and that good statistical modeling catches the
features of interest for that particular purpose.